Accessibility percolation on n-trees 
Abstract - Accessibility percolation is a new type of percolation problem inspired by evolutionary
biology. To each vertex of a graph a random number is assigned and a path through the graph 
is called accessible if all numbers along the path are in ascending order. For the case when the 
random variables are independent and identically distributed, we derive an asymptotically exact 
expression for the probability that there is at least one accessible path from the root to the leaves 
in an n-tree. This probability tends to 1 (0) if the branching number is increased with the height 
of the tree faster (slower) than linearly. When the random variables are biased such that the 
mean value increases linearly with the distance from the root of the tree, a percolation threshold 
emerges at a finite value of the bias. 



Introduction and outline. — Percolation theory in 
its modern form was introduced by Broadbent and Hammersley
in 1957 [1]. Since then percolation has become a
cornerstone of probability theory and statistical physics,
with applications ranging from molecular dynamics to star
formation [2,3], and new variants of the problem continue
to attract much attention (e.g., [4-7]). Standard percolation
theory is concerned with the loss of global connectivity
in a graph when vertices or bonds are randomly removed,
as quantified by the probability for the existence
of an infinite cluster of contiguous vertices.

Here we consider a novel kind of percolation problem 
inspired by evolutionary biology. Imagine a population of 
some lifeform endowed with the same genetic type (geno- 
type). If a mutation occurs, a new genotype is created 
which can die out or replace the old one. Provided natu- 
ral selection is sufficiently strong, the latter only happens 
if the new genotype has larger fitness. As a consequence, 
on longer timescales the genotype of the population takes 
a path through the space of genotypes along which the fit- 
ness is monotonically increasing [8]. Such a path is called 
selectively accessible [MTU]. 

Since the relationship between genotype and fitness is 
very complicated and largely unknown, it is natural to 
assign fitness values to genotypes in a random way. Evo- 
lutionary accessibility thus becomes a statistical property 
of random fitness landscapes [11-13]. In recent years it has
become possible to experimentally determine fitness landscapes
comprising small numbers of genetic loci [10,14,15],
and the study of evolutionary accessibility is an important
tool for characterizing and interpreting such data sets

PHQ2]. 

On an abstract level, the problem of interest is defined
as follows: Consider a graph G where a continuous random
variable $w(\sigma)$ is assigned to each vertex $\sigma$. A path of
contiguous vertices

$$\sigma_1 \to \sigma_2 \to \dots \to \sigma_n$$

through the graph is called accessible if

$$w(\sigma_1) < w(\sigma_2) < \dots < w(\sigma_n).$$

Fig. 1: An n-tree with branching number n = 3 and height
h = 2.

Accessibility percolation studies the statistics of such
accessible paths, specifically the probability for the existence
of paths that span the entire graph. An important difference
to standard percolation is that accessibility depends
on the discrete gradient of a globally defined fitness
function $w: G \to \mathbb{R}$, rather than on local random variables
[13]. To see this, it is useful to formulate the problem as
a kind of bond percolation on directed graphs, where the
edge from $\sigma$ to $\sigma'$ is removed if $w(\sigma) > w(\sigma')$ and a path
is called accessible if all edges are present along the path.
In the simple case when the $w(\sigma)$ are chosen as independent
and identically distributed (i.i.d.) random variables,
this means that a given edge is removed with probability
1/2, but in contrast to standard bond percolation the
removal of different edges is correlated through the fitness
function.
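The correlation between edges can be made concrete on the smallest non-trivial example. The following sketch enumerates all equally likely orderings of i.i.d. values on a short path; the function name `edge_statistics` is a hypothetical helper introduced here for illustration and is not part of the original work.

```python
from fractions import Fraction
from itertools import permutations

def edge_statistics(path_length=3):
    """Enumerate all orderings of i.i.d. values on a path of `path_length` vertices.

    Returns (probability that the first edge is present,
             probability that every edge is present).
    """
    orderings = list(permutations(range(path_length)))
    total = len(orderings)
    # An edge k -> k+1 is present if the value increases along it.
    first_edge = sum(1 for w in orderings if w[0] < w[1])
    all_edges = sum(
        1 for w in orderings
        if all(w[k] < w[k + 1] for k in range(path_length - 1))
    )
    return Fraction(first_edge, total), Fraction(all_edges, total)

p_edge, p_path = edge_statistics(3)
print(p_edge, p_path)  # 1/2 1/6
```

Each edge alone is present with probability 1/2, but both edges of a three-vertex path are present with probability 1/6 rather than the value 1/4 that independent bond percolation would give.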

In the biological context the natural choice for the un- 
derlying graph G is the L-dimensional hypercube, where 
L is the number of binary genetic loci [TTirTTHTS] . The 
topology of the hypercube is rather peculiar in that the 
diameter of the graph (as measured by the length of the 
longest path) is equal to its dimension (as measured by 
the coordination number of a vertex). Recent numerical 
and analytic work has shown that accessibility percolation 
on the hypercube with i.i.d. random variables is critical, 
in the sense that minor changes of the model can trigger 
a transition from low to high accessibility [12,20].

In order to further explore the role of graph topology
in the phenomenology of accessibility percolation, we here
consider the problem on n-trees. The simple structure of
the trees allows us to obtain precise analytic bounds on
the probability of existence of accessible paths. Moreover,
by scaling the branching number of the tree with its height
we are able to mimic the structure of the hypercube and
place previous results [12,20] into a broader context.

An n-tree or complete n-ary tree is a rooted tree where
each vertex is connected to n child vertices, except for the 
leaves which have no children. The height h is defined as 
the distance from the root to the leaves, i.e., any path from 
the root to a leaf consists of h + 1 vertices (cf. fig. 1). This
graph is very similar to graphs known as Cayley trees or 
Bethe lattices, though these are usually defined such that 
each vertex has the same coordination number rather than 
the same number of child vertices, i.e., the root would have 
n + 1 child vertices. 

Our results are as follows. First we assume that the random
variables $w(\sigma)$ are drawn independently from a single
continuous distribution, a setting known in the evolution- 
ary context as the House of Cards (HoC) model [T7 l l2"T j . 



As accessibility is determined solely by the ordering of
the random variables on the tree, in this case the results
are independent of the choice of the underlying probability
distribution. We will give an exact expression for
the second moment of the number N of accessible paths
from the root to the leaves. Furthermore, we will show
that the probability $P(N \geq 1)$ that there is at least one
accessible path is asymptotically equivalent to $n^h/h!$ for
large h. This result is valid if n is constant as well as if
n = n(h) is a function that grows more slowly than linearly
with h. If the growth is faster than linear, the probability
$P(N \geq 1)$ tends to 1 for $h \to \infty$. For a linearly growing
function $n(h) = \alpha h$ there is a percolation threshold, i.e.,
$P(N \geq 1) \to 0$ for $\alpha < \alpha_c$ and $\lim_{h\to\infty} P(N \geq 1) > 0$ for
$\alpha > \alpha_c$ with $\alpha_c \in [e^{-1}, 1]$.

We then consider the effect of a bias that increases the 
mean fitness linearly with the distance from the root, cor- 
responding to the Rough Mount Fuji (RMF) model of 
fitness landscapes in evolutionary biology fT2"IrT()ll2"2"Il2l5] . 
Specifically, we set $w(\sigma) = x_\sigma - \theta d(\sigma)$, where $d(\sigma)$ is the
distance to the closest leaf, $\theta > 0$ is the bias parameter,
and the $x_\sigma$ are i.i.d. random variables which (for technical
reasons to be explained below) are drawn from a Gumbel
probability density function. We show that this model
displays a percolation threshold at a nonzero value $\theta_c$ of
the bias, such that $\lim_{h\to\infty} P(N \geq 1) = 0$ for $\theta < \theta_c$ and
$\lim_{h\to\infty} P(N \geq 1) > 0$ for $\theta > \theta_c$.

Basic properties of n-trees. — Let us recall some 
basic properties of n-trees and their accessibility. The 
number of leaves is given by $n^h$. Since each leaf corresponds
uniquely to a path, this is also the number of
paths. In previous work on accessibility percolation on
the hypercube, it has usually been assumed that the value
of the destination vertex is the global fitness maximum
[12,20]. Analogously we will assume here that the starting
vertex, i.e., the root, is the global minimum. For the
case of i.i.d. random variables $w(\sigma)$, there are then $h!$
different equally likely orderings of the values encountered
along a path, and hence the probability that a given path
is accessible is $1/h!$. By linearity the first moment of the
number N of accessible paths follows immediately [12]:

$$\langle N \rangle = \frac{n^h}{h!}. \tag{1}$$



Another useful property of n-trees is their recursive structure.
A tree of height h + 1 can be thought of as a root
which is connected to n trees of height h. This property
enables us to give an explicit but recursive equation for the
probability of having no accessible path from the root to a
leaf. Let us denote the probability that there is no accessible
path in a tree of height h, given that the root has the value
w(0) = x, by $Q_h(x)$. A tree of height h + 1 is not accessible
if each vertex $\sigma$ with distance 1 to the root has either a value
$w(\sigma) < x$ or it has a larger value but the subtree of height
h with $\sigma$ as the root is not accessible. This leads to

$$Q_1(x) = F_0(x)^n, \qquad Q_{h+1}(x) = \left[ F_h(x) + \int_x^{\infty} f_h(y)\, Q_h(y)\, dy \right]^n, \tag{2}$$

where $f_d(x)$ and $F_d(x)$ are the probability density function
and cumulative distribution function, respectively, of the
values $w(\sigma)$ belonging to the vertices which have distance
d to the closest leaf. For future reference we allow these
distributions to depend explicitly on d. The probability of
having at least one accessible path in a tree of height h is
then given by

$$P(N \geq 1) = 1 - \int_{-\infty}^{\infty} f_h(x)\, Q_h(x)\, dx. \tag{3}$$

So far we did not find a closed analytic solution of the
recurrence relation (2). However, the equation can be solved
numerically by approximating the integral by the trape- 
zoidal rule. In the following we will use this to support 
and supplement our analytic results, which are based on 
the analysis of the moments of N. 
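A minimal sketch of such a numerical solution is given below. It rests on assumptions that are not spelled out in the text: the values are taken to be uniform on [0, 1] (admissible in the i.i.d. case, where only the ordering matters), the root is fixed at the global minimum x = 0, and the grid size is an arbitrary choice. The function names are hypothetical.

```python
def no_path_probability(n, h, grid=2001):
    """Tabulate Q_h(x) on a uniform grid over [0, 1] for i.i.d. uniform values."""
    dx = 1.0 / (grid - 1)
    xs = [i * dx for i in range(grid)]
    q = [x ** n for x in xs]  # Q_1(x) = F(x)^n with F(x) = x
    for _ in range(h - 1):
        # tail[i] approximates the integral of Q over [xs[i], 1]
        # using the trapezoidal rule, accumulated from the right.
        tail = [0.0] * grid
        for i in range(grid - 2, -1, -1):
            tail[i] = tail[i + 1] + 0.5 * dx * (q[i] + q[i + 1])
        q = [(x + t) ** n for x, t in zip(xs, tail)]
    return q

def prob_accessible(n, h, grid=2001):
    """P(N >= 1) when the root carries the global minimum, i.e. x = 0."""
    return 1.0 - no_path_probability(n, h, grid)[0]

print(prob_accessible(2, 2))  # close to the exact value 8/9
```

For n = 2 and h = 2 the recursion can be done by hand: $Q_2(0) = (\int_0^1 y^2\,dy)^2 = 1/9$, so the exact probability is 8/9, which the sketch reproduces up to the discretization error.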

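Calculation of the second moment. To compute
the second moment of N we define indicator variables $\Theta_i$
for each path $i \in \{1, 2, \dots, n^h\}$ by

$$\Theta_i = \begin{cases} 1, & \text{if the } i\text{-th path is accessible}, \\ 0, & \text{else}. \end{cases} \tag{4}$$

The $\Theta_i$ are dependent random variables but identically
distributed with

$$\langle \Theta_i \rangle = \langle \Theta_i^2 \rangle = P(\Theta_i = 1) = \frac{1}{h!}.$$

This yields $N = \sum_{i=1}^{n^h} \Theta_i$ and hence

$$\langle N^2 \rangle = \sum_{i} \langle \Theta_i^2 \rangle + \sum_{i \neq j} \langle \Theta_i \Theta_j \rangle = \langle N \rangle + \sum_{i \neq j} \langle \Theta_i \Theta_j \rangle. \tag{5}$$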



Note that the correlator $\langle \Theta_i \Theta_j \rangle$ depends only on the number
of vertices which the i-th and j-th path have in common.
For any two paths i and j (with $i \neq j$) which share
h - k + 1 vertices, $\langle \Theta_i \Theta_j \rangle$ is given by the probability $\pi_k$
that both paths are accessible. If the value $w(\sigma)$ of the
vertex $\sigma$ where both paths diverge is given by x (taking,
without loss of generality, the values to be uniformly
distributed on [0, 1]), the two paths can be decomposed into
three independent subpaths (cf. fig. 2). All vertices which
are closer to the root than $\sigma$ must have a smaller value
than x and all vertices closer to the leaves must have a
larger value. Additionally, all values have to be in ascending
order on each subpath. This yields



$$\pi_k = \int_0^1 \frac{x^{h-k-1}}{(h-k-1)!} \left[ \frac{(1-x)^k}{k!} \right]^2 dx = \frac{B(h-k,\, 2k+1)}{(h-k-1)!\, k!^2} = \binom{2k}{k} \frac{1}{(h+k)!}, \tag{6}$$




Fig. 2: The correlation between two paths only depends on the 
number h - k + 1 of vertices both paths have in common.



where B(x, y) is Euler's beta function. To evaluate the
sum over the correlators in eqn. (5) one also needs the
number $m_k$ of pairs of paths that have h - k + 1 common
vertices. This number can be evaluated with a simple
combinatorial consideration: For the first path, say the
left one in fig. 2, any leaf can be chosen, i.e., there are $n^h$
possibilities. The second path shares h - k + 1 vertices
with the first; then there are n - 1 potential child vertices
to choose from (if one takes the n-th vertex, it would also
belong to the first path) and finally one can choose any
child vertex until one reaches a second leaf, which gives
another $n^{k-1}$ possibilities. Altogether there are
$m_k = (n-1)\, n^{h+k-1}$ different pairs. This yields



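$$\langle N^2 \rangle = \langle N \rangle + \sum_{k=1}^{h} \pi_k m_k = \langle N \rangle + \frac{n-1}{n} \sum_{k=1}^{h} \binom{2k}{k} \frac{n^{h+k}}{(h+k)!} \tag{7}$$

$$\leq \langle N \rangle + \langle N \rangle^2 + \sum_{k=1}^{h-1} \binom{2k}{k} \frac{n^{h+k}}{(h+k)!}. \tag{8}$$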
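The closed form for the second moment can be checked against a brute-force enumeration on very small trees. The sketch below assumes the form of eqn. (7) given by $\pi_k = \binom{2k}{k}/(h+k)!$ and $m_k = (n-1)n^{h+k-1}$, stores the tree in heap order (children of vertex v are n v + 1, ..., n v + n, an implementation choice made here), and fixes the root at the global minimum. All function names are hypothetical.

```python
from fractions import Fraction
from itertools import permutations
from math import comb, factorial

def mean_paths(n, h):
    """First moment <N> = n^h / h!."""
    return Fraction(n ** h, factorial(h))

def second_moment(n, h):
    """Second moment <N^2> from the sum over correlators."""
    s = sum(Fraction(comb(2 * k, k) * n ** (h + k), factorial(h + k))
            for k in range(1, h + 1))
    return mean_paths(n, h) + Fraction(n - 1, n) * s

def brute_force_second_moment(n, h):
    """Average N^2 over all orderings of the non-root values (n >= 2, tiny trees only)."""
    size = (n ** (h + 1) - 1) // (n - 1)   # number of vertices
    first_leaf = size - n ** h             # leaves occupy the last n^h indices

    def count(v, w):
        # Number of accessible paths from vertex v down to the leaves.
        if v >= first_leaf:
            return 1
        return sum(count(c, w) for c in range(n * v + 1, n * v + n + 1)
                   if w[c] > w[v])

    total = 0
    samples = 0
    for p in permutations(range(1, size)):
        w = (0,) + p                       # root holds the global minimum
        total += count(0, w) ** 2
        samples += 1
    return Fraction(total, samples)

print(second_moment(2, 2), brute_force_second_moment(2, 2))  # 16/3 16/3
```

The enumeration grows factorially with the number of vertices, so it is only meant as a consistency check for cases such as n = 2, h = 2.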



Probability of having accessible paths. In or- 
der to gain information about the probability for the exis- 
tence of accessible paths, we will use the first- and second 
moment method, i.e., we will apply the following lemma 
20 , 2|j : Let X be a random variable which takes only
integer values {0, 1, 2, ...} and has a finite second moment.
Then the inequality 



$$\langle X \rangle \geq P(X \geq 1) \geq \frac{\langle X \rangle^2}{\langle X^2 \rangle} \tag{9}$$



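holds true. By combining this with eqns. (1) and (8) one
obtains the inequality

$$\langle N \rangle \geq P(N \geq 1) \geq \frac{\langle N \rangle^2}{\langle N \rangle + \langle N \rangle^2 + S(h)}, \tag{10}$$

where

$$S(h) = \sum_{k=1}^{h-1} \binom{2k}{k} \frac{n^{h+k}}{(h+k)!}. \tag{11}$$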
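The two-sided bound is straightforward to evaluate exactly. The following sketch computes both sides with rational arithmetic; capping the upper bound at 1 is an addition made here (a probability cannot exceed 1), and the function names are hypothetical.

```python
from fractions import Fraction
from math import comb, factorial

def mean_paths(n, h):
    """First moment <N> = n^h / h!."""
    return Fraction(n ** h, factorial(h))

def s_sum(n, h):
    """S(h): sum over k = 1, ..., h-1 of binom(2k, k) n^(h+k) / (h+k)!."""
    return sum(Fraction(comb(2 * k, k) * n ** (h + k), factorial(h + k))
               for k in range(1, h))

def accessibility_bounds(n, h):
    """Return (lower, upper) bounds on P(N >= 1)."""
    m = mean_paths(n, h)
    lower = m ** 2 / (m + m ** 2 + s_sum(n, h))
    upper = min(m, Fraction(1))  # assumption: trivial cap at 1 added here
    return lower, upper

print(accessibility_bounds(2, 2))  # (Fraction(6, 13), Fraction(1, 1))
```

For n = 2 and h = 2 one has $\langle N \rangle = 2$ and $S(2) = 8/3$, so the lower bound is 4/(2 + 4 + 8/3) = 6/13.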
We will treat the branching number n as a function of h.
Since we are going to distinguish the three cases of linear
growth, growth faster than linear and growth slower than
linear, we make the ansatz

$$n = n(h) = h\, \alpha(h). \tag{12}$$

The three cases correspond to $\alpha(h) = \alpha = \text{const.}$, $\alpha(h) \to \infty$
and $\alpha(h) \to 0$, respectively. Furthermore, we will assume
that $n(h+1) \geq n(h)$. Using Stirling's formula we
obtain for the mean value of N

$$\langle N \rangle \approx \frac{[e\, \alpha(h)]^h}{\sqrt{2 \pi h}}. \tag{13}$$

This expression goes to zero if $\alpha(h) \to 0$ or $\alpha(h) = \alpha < e^{-1}$
and diverges if $\alpha(h) \to \infty$ or $\alpha(h) = \alpha > e^{-1}$.
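As a quick numerical illustration of the Stirling estimate for the mean, the sketch below compares $n^h/h!$ with $(e\alpha)^h/\sqrt{2\pi h}$ for a constant $\alpha$. Treating $n = \alpha h$ as a real number (rather than rounding it to an integer) is a simplification made here, and the function names are hypothetical.

```python
from math import e, factorial, pi, sqrt

def mean_exact(alpha, h):
    """<N> = n^h / h! with n = alpha * h treated as a real number."""
    n = alpha * h
    return n ** h / factorial(h)

def mean_stirling(alpha, h):
    """Stirling estimate (e * alpha)^h / sqrt(2 * pi * h)."""
    return (e * alpha) ** h / sqrt(2 * pi * h)

for alpha in (0.3, 0.5):
    ratio = mean_exact(alpha, 50) / mean_stirling(alpha, 50)
    print(alpha, mean_exact(alpha, 50), ratio)
```

The ratio approaches 1 as h grows, and the mean decays for $\alpha = 0.3 < e^{-1}$ while it grows for $\alpha = 0.5 > e^{-1}$.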

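First we consider the case $\alpha(h) \to 0$ and show that S(h)
grows more slowly than $\langle N \rangle$. Stirling's formula applied to
eqn. (11) leads to

$$\begin{aligned}
S(h) &\leq \sum_{k=1}^{h-1} \frac{4^k e^{h+k+1} n^{k+h}}{2\pi \sqrt{\pi k (k+h)}\, (k+h)^{k+h}} \\
&\leq \frac{e^{h+1} n^h}{2\pi \sqrt{\pi (h+1)}\, (h+1)^h} \sum_{k=1}^{h-1} \left( \frac{4en}{h+1} \right)^k \\
&< \langle N \rangle\, x\, \frac{1 - x^{h-1}}{1 - x} \qquad \text{with } x = \frac{4en}{h+1}.
\end{aligned}$$

Since $x \to 0$ for large h also $S(h) \langle N \rangle^{-1}$ tends to zero. This
implies according to eqn. (10) that $P(N \geq 1)$ is asymptotically
equivalent to $\langle N \rangle$, and both quantities vanish
asymptotically for large h (cf. fig. 3).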
Fig. 3: The probability $P(N \geq 1)$ as a function of h for constant
branching numbers n. Solid lines represent the asymptotic
expression $P(N \geq 1) \sim \langle N \rangle$. Symbols correspond to the
numerical solution of the recurrence relation (2).

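The case $\alpha(h) \to \infty$ is a bit more complicated. Similar
to the previous case, we will show that S(h) grows
more slowly than $\langle N \rangle^2$, which implies that $P(N \geq 1) \to 1$
for $h \to \infty$. Instead of using Stirling's formula we will
estimate the function $\xi(h) = S(h) \langle N \rangle^{-2}$ recursively:

$$\begin{aligned}
\xi(h+1) &= \frac{(h+1)!^2}{n(h+1)^{h+1}} \sum_{k=1}^{h} \binom{2k}{k} \frac{n(h+1)^k}{(h+k+1)!} \\
&\leq \frac{(h+1)^2}{(2h+1)\, n(h+1)} + \frac{(h+1)^2}{(h+2)\, n(h+1)}\, \xi(h) \\
&< \frac{1 + \xi(h)}{\alpha(h+1)}.
\end{aligned} \tag{14}$$

Since $\alpha(h) \to \infty$, we can fix any constant $\Omega > 1$ and find
$h_0$ such that $\alpha(h) > \Omega$ for all $h > h_0$. Then it can be
shown by induction that

$$\xi(h) < \frac{\xi(h_0)}{\Omega^{h-h_0}} + \sum_{k=1}^{h-h_0} \frac{1}{\Omega^k} < \frac{\xi(h_0)}{\Omega^{h-h_0}} + \frac{1}{\Omega - 1},$$

and hence $\xi(h) \to 0$ since $\Omega$ can become arbitrarily large.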

Finally we consider the case $\alpha(h) = \alpha = \text{const.}$ Using
the recurrence relation (14), it can be shown by induction
that $\xi(h) < (\alpha - 1)^{-1}$, given that $\alpha > 1$. Hence the limit
of $\xi(h)$ is finite and according to eqns. (10) and (13) we get

$$\lim_{h \to \infty} P(N \geq 1) \begin{cases} = 0 & \text{for } \alpha < e^{-1}, \\ > 0 & \text{for } \alpha > 1. \end{cases}$$

It is reasonable to assume that the limit is monotonic in
$\alpha$. Therefore, there must be a threshold $\alpha_c \in [e^{-1}, 1]$ such
that $P(N \geq 1) \to 0$ for $\alpha < \alpha_c$ and $\lim_{h\to\infty} P(N \geq 1) > 0$
for $\alpha > \alpha_c$.
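The bound on $\xi(h)$ for linear scaling can be probed numerically. The sketch below evaluates $\xi(h) = S(h)/\langle N \rangle^2$ exactly for $n = \alpha h$ with the illustrative choice $\alpha = 2$ and compares it with $(\alpha - 1)^{-1}$; the function name `xi` and the range of heights are choices made here, not taken from the original.

```python
from fractions import Fraction
from math import comb, factorial

def xi(n, h):
    """xi(h) = S(h) / <N>^2, evaluated with exact rational arithmetic."""
    mean = Fraction(n ** h, factorial(h))
    s = sum(Fraction(comb(2 * k, k) * n ** (h + k), factorial(h + k))
            for k in range(1, h))
    return s / mean ** 2

alpha = 2
values = [xi(alpha * h, h) for h in range(1, 31)]
print(max(values) < Fraction(1, alpha - 1))  # True
print(float(values[-1]))
```

Since the lower bound in the moment inequality can be written as $1/(1 + \langle N \rangle^{-1} + \xi(h))$, a bounded $\xi(h)$ together with a diverging mean keeps the probability of an accessible path away from zero.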

Effect of a bias. — In this section we assume that the 
fitness values $w(\sigma)$, rather than being i.i.d., are distributed
according to $f(x + d(\sigma)\theta)$ where $d(\sigma)$ is the distance from
$\sigma$ to the closest leaf and $f(x) = \exp(-x - \exp(-x))$ is the
probability density of the Gumbel distribution. The reason
for this choice is that for this distribution the probability
for the random variables along a path to be in ascending
order (the 'ordering probability') is known in closed
form [23]. Using this result it follows that the probability
for a path of length k (i.e., consisting of k vertices)
directed towards the leaves to be accessible is

$$P_k(\theta) = a(\theta)^k\, b(\theta, k), \tag{15}$$

where

$$a(\theta) = 1 - e^{-\theta} \quad \text{and} \quad b(\theta, k) = \prod_{j=1}^{k} \frac{1}{1 - e^{-j\theta}}. \tag{16}$$

Note that $b(\theta, k)$ is monotonically increasing with k and
converges for $k \to \infty$. This leads to the bounds

$$1 < b(\theta, k) < \lim_{k \to \infty} b(\theta, k) = b(\theta) \tag{17}$$
with



which finally leads to 



b(0) 



exp 



7T 

60 



24 



(18) 



In the following we will show that $\lim_{h\to\infty} P(N \geq 1) = 0$
for $a(\theta) < n^{-1}$ and $\lim_{h\to\infty} P(N \geq 1) > 0$ for $a(\theta) > n^{-1}$,
and therefore the percolation threshold occurs at
$\theta_c = \ln(n) - \ln(n-1)$.

First let $\theta < \theta_c$, i.e., $a(\theta) < n^{-1}$. Then according to
eqn. (9) we get

$$P(N \geq 1) \leq \langle N \rangle = P_{h+1}(\theta)\, n^h < a(\theta)\, b(\theta)\, [a(\theta)\, n]^h \to 0$$

for $h \to \infty$.
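The threshold and the behaviour of the mean on either side of it can be illustrated numerically. The sketch below assumes that $b(\theta, k)$ is the finite product $\prod_{j=1}^{k} (1 - e^{-j\theta})^{-1}$, which is what the ordering probability of Gumbel variables with a linear drift gives; the function names are hypothetical.

```python
from math import exp, log

def theta_c(n):
    """Critical bias: a(theta_c) = 1/n, i.e. theta_c = ln(n) - ln(n - 1)."""
    return log(n) - log(n - 1)

def a(theta):
    return 1.0 - exp(-theta)

def b(theta, k):
    """Assumed closed form: product over j = 1..k of 1 / (1 - exp(-j * theta))."""
    prod = 1.0
    for j in range(1, k + 1):
        prod /= 1.0 - exp(-j * theta)
    return prod

def mean_paths(n, h, theta):
    """<N> = P_{h+1}(theta) * n^h for a tree of height h."""
    return a(theta) ** (h + 1) * b(theta, h + 1) * n ** h

print(theta_c(4))  # approximately 0.2877, i.e. ln(4/3)
print(mean_paths(4, 100, 0.2), mean_paths(4, 200, 0.2))  # decays below threshold
print(mean_paths(4, 100, 0.5), mean_paths(4, 200, 0.5))  # grows above threshold
```

For n = 4 this reproduces the critical bias ln(4/3), and the mean number of accessible paths shrinks with h for a bias of 0.2 and grows for a bias of 0.5.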

Fig. 4: Probability of finding accessible paths in a 4-tree as
a function of the bias $\theta$. The critical bias is given by
$\theta_c = \ln(4/3) \approx 0.288$. Symbols represent the numerical
solution of eqn. (2); the lower bound is given by eqn. (19).

Now let $\theta > \theta_c$. In order to apply eqn. (9) one needs
to compute the second moment, which in turn requires
the calculation of the probability $\pi_k$ that two paths that
share h - k + 1 vertices are both accessible. A necessary
condition for this is that the h - k + 1 common vertices
are accessible as well as both separated subpaths, which
consist of k vertices each (cf. fig. 2). Therefore
$\pi_k \leq P_{h-k+1}(\theta)\, P_k(\theta)^2$ and hence

$$\begin{aligned}
\sum_{i \neq j} \langle \Theta_i \Theta_j \rangle &= \sum_{k=1}^{h} m_k \pi_k \leq \sum_{k=1}^{h} m_k\, P_{h-k+1}(\theta)\, P_k(\theta)^2 \\
&= (n-1) \sum_{k=1}^{h} n^{h+k-1}\, a(\theta)^{h+k+1}\, b(\theta, h-k+1)\, b(\theta, k)^2.
\end{aligned}$$

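After using the upper bound for $b(\theta, k)$ from eqn. (17),
introducing $y = n\, a(\theta) > 1$ and evaluating the geometric
sum one gets

$$\sum_{i \neq j} \langle \Theta_i \Theta_j \rangle < b(\theta)^3\, \frac{n-1}{n^2}\, \frac{y^{h+2} (y^h - 1)}{y - 1},$$

so that

$$\begin{aligned}
P(N \geq 1) &\geq \frac{\langle N \rangle^2}{\langle N^2 \rangle} = \frac{\langle N \rangle}{1 + \langle N \rangle^{-1} \sum_{i \neq j} \langle \Theta_i \Theta_j \rangle} \\
&> \frac{a(\theta)\, b(\theta, h+1)\, y^h}{1 + b(\theta)^2 (y^h - 1)\, y\, (y-1)^{-1}} \\
&\xrightarrow{h \to \infty} \frac{a(\theta)}{b(\theta)}\, \frac{y-1}{y} > 0.
\end{aligned} \tag{19}$$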
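The limiting value of the lower bound can be evaluated numerically. The sketch below assumes that $b(\theta)$ is the infinite product $\prod_{j \geq 1} (1 - e^{-j\theta})^{-1}$, truncated once the factors are indistinguishable from 1 in double precision, and returns zero at or below the threshold, where the bound carries no information. The function names and the tolerance are choices made here.

```python
from math import exp

def b_limit(theta, tol=1e-15):
    """Truncated infinite product of 1 / (1 - exp(-j * theta)) over j >= 1."""
    prod = 1.0
    j = 1
    while True:
        q = exp(-j * theta)
        if q < tol:  # remaining factors equal 1 to machine precision
            break
        prod /= 1.0 - q
        j += 1
    return prod

def lower_bound_limit(n, theta):
    """Limit h -> infinity of the lower bound: a (y - 1) / (b y), with y = n a."""
    a = 1.0 - exp(-theta)
    y = n * a
    if y <= 1.0:
        return 0.0  # at or below threshold the bound is not applicable
    return a * (y - 1.0) / (b_limit(theta) * y)

for theta in (0.2, 0.5, 1.0):
    print(theta, lower_bound_limit(4, theta))
```

For a 4-tree the bound vanishes at a bias of 0.2 and is small but positive at 0.5 and 1.0, which is enough to establish a nonzero limit above the threshold.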



The lower bound proves that there is a threshold, but the
numerical solution of eqn. (2) indicates that it is not a
good estimate for the exact value of $P(N \geq 1)$ (cf. fig. 4).
Nevertheless the transition is clearly continuous, in
qualitative agreement with the behavior of the bound (19).

Note that the threshold criterion $a(\theta_c) = n^{-1}$ applies
whenever the probability that a path of length k is accessible
can be written as in eqn. (15), where $b(\theta, k)$ is bounded
by two positive functions of $\theta$, i.e. $c_1(\theta) > b(\theta, k) >
c_2(\theta) > 0$ for all k and $\theta > 0$. In particular, for $b(\theta, k) = 1$
and $a(\theta) = \theta$ it reduces to the case of standard percolation
where vertices are removed with probability $1 - \theta$ and
a path is accessible if all vertices are present. This leads
directly to the well-known result $\theta_c = n^{-1}$ for standard
percolation in n-trees [2]. A bound on $P_k(\theta)$ of the form (15)
has been obtained in [20] for RMF models with a fairly
general class of probability distributions, indicating that
the behavior described here is not restricted to the Gumbel
distribution.

Conclusions. — In this paper we studied evolution- 
ary accessibility as a new type of percolation problem and 
considered the statistics of accessible paths on trees. For 
the case of i.i.d. random variables, we presented a method 
for the exact calculation of the second moment $\langle N^2 \rangle$ of
the number of accessible paths. Based on this result, we
could show by the second moment method that the probability
$P(N \geq 1)$ of finding accessible paths is asymptotically
equivalent to the mean value $\langle N \rangle$ for large heights h
and constant branching number n. When n is scaled with
h, we show that $P(N \geq 1)$ goes to 0 or 1 depending on
whether n(h) grows more slowly or faster than linearly. In
case of a linear scaling $n \propto h$ the behavior depends on the
constant of proportionality $\alpha$. We can show that the limit
of $P(N \geq 1)$ is positive if $\alpha > 1$ and zero if $\alpha < e^{-1}$, but
our method fails for values in between.

The tree with linear scaling $n = \alpha h$ is of particular
interest, as it approximates the topology of the hypercube.
There are different ways of relating the two graphs that
lead to different values of the prefactor $\alpha$. Observing that
the coordination number of the hypercube is equal to its
diameter one could set n = h and hence $\alpha = 1$. On the
other hand, the total number of pathways traversing a
hypercube of size h is $h! \sim (h/e)^h$, which can be equated
to the number of paths on the tree, $n^h = (\alpha h)^h$, to yield
$\alpha = e^{-1}$. In either case, the comparison to the tree shows that
the hypercube is poised near the threshold between high
and low accessibility. This is consistent with the rigorous
analysis of [20], which shows that the HoC model on the
hypercube undergoes an abrupt transition from
$P(N \geq 1) \approx 1$ to $P(N \geq 1) \approx 0$ as the fitness value of the
starting vertex is increased.

The different topological structure of hypercubes and
trees also manifests itself in the response to a bias $\theta$
applied to the random variables along the paths. On the
hypercube any finite bias $\theta > 0$ leads to high accessibility
$P(N \geq 1) \approx 1$ [12,20], while on the tree with fixed branching
number n a continuous percolation transition occurs
at a threshold value $\theta_c > 0$. Although the explicit calculation
of the threshold was restricted here to the case of a
Gumbel distribution, we expect this behavior to apply for
a large class of distributions (see [20,23]).

Our results as well as the inequality (9) indicate that
the probability of having at least one accessible path is,
in general, closely related to their mean number. The left
part of the inequality implies that the probability goes
to zero if the mean does. If the mean diverges, a
non-vanishing probability for accessible paths results provided
the ratio $\langle N^2 \rangle / \langle N \rangle^2$ remains bounded as the graph size
tends to infinity. This was the case in the examples
considered here, but it need not always be true. Explicit
counterexamples are the HoC model on the hypercube without
conditions on the starting vertex, for which $\langle N \rangle = 1$ and
$P(N \geq 1) \to 0$ [12,20], and the block model of protein
evolution [25], for which $\langle N \rangle \to \infty$ and $P(N \geq 1) \to 0$ as
the hypercube dimension $L \to \infty$ [25]. It is clear from (9)
that this behavior requires very large fluctuations in N,
and it will be interesting to further explore these fluctuations
and their impact on accessibility.

Obviously, accessibility percolation can also be studied 
on other graphs. But on most structures which are not 
tree-like it is quite complicated to calculate the number 
of paths between two points which is necessary to calculate
the second moment and to apply the inequality (9).
Nevertheless the problem could be analyzed numerically 
or with other analytical methods. 